Material for “Application of the Bayesian MMSE Estimator for Classification Error to Gene-Expression Microarray Data” Lori
Authors
Abstract
The introduction of our paper discusses the leave-one-out and cross-validation error estimators. In our implementation of cross-validation, we use k = 5 folds and 5 repetitions, each with a different random partition. The basic bootstrap zero estimator, $\hat{\varepsilon}_{b0}$ [3], [4], generates B bootstrap samples, each consisting of n equally likely draws with replacement from the original sample of size n. Each bootstrap sample is then used to design a surrogate classifier, and the points left out of that bootstrap sample are used as a holdout set to estimate the error of the surrogate classifier. The bootstrap zero estimator is the average of these errors. Like cross-validation, this error estimator is randomized because the bootstrap samples are selected at random, and it tends to be pessimistic because each surrogate classifier is designed on only about 0.632n distinct points on average (the expected fraction of original points that appear in a bootstrap sample is $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$). In our simulations, we use the popular .632 bootstrap error estimator with B = 100, which attempts to correct the pessimistic bias of the bootstrap zero estimator by combining it with the optimistically biased resubstitution estimate $\hat{\varepsilon}_{\mathrm{resub}}$. In particular,
$$\hat{\varepsilon}_{b632} = 0.368\,\hat{\varepsilon}_{\mathrm{resub}} + 0.632\,\hat{\varepsilon}_{b0}.$$
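The estimators described above can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the classifier is a hypothetical nearest-mean rule (`train_nearest_mean`) standing in for whatever classifier is being assessed, all function names are illustrative, bootstrap replicates that happen to contain every point (and so leave no holdout) are skipped, and the .632 weights follow the standard definition (0.368 on resubstitution, 0.632 on the bootstrap zero estimate).

```python
import random
from statistics import mean


def train_nearest_mean(points, labels):
    """Hypothetical stand-in classifier: nearest class mean (labels 0 and 1)."""
    means = {}
    for c in (0, 1):
        members = [p for p, y in zip(points, labels) if y == c]
        if members:  # a bootstrap sample may miss a class entirely
            means[c] = [mean(coords) for coords in zip(*members)]

    def classify(x):
        # Assign x to the class whose mean is closest in squared Euclidean distance.
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

    return classify


def error_rate(classify, points, labels):
    """Fraction of points misclassified by classify."""
    return sum(classify(p) != y for p, y in zip(points, labels)) / len(points)


def resubstitution(points, labels, train):
    """Design on the full sample and test on the same sample (optimistic)."""
    return error_rate(train(points, labels), points, labels)


def repeated_cv(points, labels, train, k=5, reps=5, rng=random):
    """k-fold cross-validation repeated `reps` times with fresh random partitions."""
    n = len(points)
    wrong = 0
    for _ in range(reps):
        idx = list(range(n))
        rng.shuffle(idx)  # a different random partition for each repetition
        folds = [idx[i::k] for i in range(k)]
        for fold in folds:
            held = set(fold)
            train_idx = [i for i in idx if i not in held]
            clf = train([points[i] for i in train_idx], [labels[i] for i in train_idx])
            wrong += sum(clf(points[i]) != labels[i] for i in fold)
    # Equals the mean of the `reps` individual k-fold estimates.
    return wrong / (reps * n)


def bootstrap_zero(points, labels, train, B=100, rng=random):
    """Average holdout error of surrogate classifiers over B bootstrap samples."""
    n = len(points)
    errs = []
    for _ in range(B):
        draw = [rng.randrange(n) for _ in range(n)]  # n equally likely draws, with replacement
        left_out = sorted(set(range(n)) - set(draw))
        if not left_out:  # assumption: skip replicates with no holdout points
            continue
        clf = train([points[i] for i in draw], [labels[i] for i in draw])
        errs.append(error_rate(clf, [points[i] for i in left_out], [labels[i] for i in left_out]))
    return mean(errs)


def bootstrap_632(points, labels, train, B=100, rng=random):
    """.632 bootstrap: 0.368 * resubstitution + 0.632 * bootstrap zero."""
    return (0.368 * resubstitution(points, labels, train)
            + 0.632 * bootstrap_zero(points, labels, train, B, rng))


if __name__ == "__main__":
    gen = random.Random(0)
    pts = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(20)] + \
          [(gen.gauss(1.5, 1), gen.gauss(1.5, 1)) for _ in range(20)]
    labs = [0] * 20 + [1] * 20
    print("resub ", resubstitution(pts, labs, train_nearest_mean))
    print("cv 5x5", repeated_cv(pts, labs, train_nearest_mean, rng=random.Random(1)))
    print("b632  ", bootstrap_632(pts, labs, train_nearest_mean, rng=random.Random(2)))
```

Pooling the misclassifications over all repetitions and dividing by reps × n gives the same value as averaging the five individual 5-fold estimates, because each partition tests every point exactly once.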
Similar papers
Application of the Bayesian MMSE estimator for classification error to gene expression microarray data
MOTIVATION: With the development of high-throughput genomic and proteomic technologies, coupled with the inherent difficulties in obtaining large samples, biomedicine faces difficult small-sample classification issues, in particular, error estimation. Most popular error estimation methods are motivated by intuition rather than mathematical inference. A recently proposed error estimator based on ...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement on the k-means clustering method. It is an incremental clustering method that starts with one cluster. Iteratively ...
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information, with thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of tumor differences. In this study, we try to reduce the high-dimensional data by statistical methods to select valuable, high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding helpful molecular patterns. The large volume of gene expression data also increases the challenge of identifying biomarkers associated with cancer. Random forest (RF) is used to effectively analyze the problems of large-p and smal...
Expression Profiling of Microarray Gene Signatures in Acute and Chronic Myeloid Leukaemia in Human Bone Marrow
Background: The potential of microarray signatures to transform pathological diagnosis through the classification of cancer subtypes is becoming increasingly difficult to ignore; nonetheless, measuring indicator genes in routine practice appears to be arduous. In a previously published study, we used real-time PCR measurement of indicator genes in acute lymphoid leukaemia (ALL) and acute m...